M5 Competition Pipeline - Forecasting Sales of Walmart Products


Author: Anil Gurbuz

Date: 27 JUN 2020


Introduction

Acknowledgement

I want to start by thanking the winning team of the "Favorita Grocery Sales Forecasting" competition for sharing their interesting solution, which achieved the highest predictive performance using only the last 3 months of data with very little training time, and from which I learned a lot. This notebook is an adaptation of that approach to the M5 forecasting competition.

What is interesting about this approach?

Instead of representing each day's sales as a row in the training data, as in most of the public kernels, each series is represented as a row -- similar to the format given by the competition organisers. This is particularly advantageous for code efficiency because we don't need pandas "groupby", which is horribly slow when the number of groups is high, as in this data -- 30490 groups.
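To make the efficiency point concrete, here is a minimal sketch with made-up data (the names and sizes are mine, not part of the pipeline) comparing one running statistic in the wide, one-row-per-series layout with the same statistic in the long, one-row-per-day layout:

import numpy as np
import pandas as pd

# Wide layout: rows = 30490 series, columns = days; a 7-day mean is one vectorized call
wide = pd.DataFrame(np.random.poisson(1.0, size=(30490, 28)),
                    columns=pd.date_range("2016-03-28", periods=28))
last7_mean = wide.iloc[:, -7:].mean(axis=1)

# Long layout: rows = series-days; the same statistic needs a groupby over 30490 groups
long_df = wide.stack().rename("sales").reset_index()
last7_mean_long = long_df.groupby("level_0")["sales"].apply(lambda s: s.tail(7).mean())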

I create 200+ features for this first version of the notebook. They are mostly running statistics on sales, plus calendar, intermittent-demand and price related features -- some are versions of features from other public kernels adjusted for this approach.

All feature creation and the training of the 28 models finishes in around 20 minutes.

What about predictive performance?

I built my pipeline with the regular approach and also developed this one to discover how this interesting modelling technique works. In terms of the final WRMSSE score, the regular approach almost always performed better, even though it was based on only some basic features relative to this approach. Still, the best results I managed to achieve with each approach were not so different: my best 28-day model with the regular approach gave LB 0.61XX, whereas with this approach I managed to get LB 0.63XX; neither uses recursive features or magic multipliers. I used quite a lot of complex features that I prefer to keep to myself until the end of the competition, though I can say they mainly target SNAP-sales interactions.

In terms of performance, this version of the notebook is intended as starter code for those of you who are willing to explore this approach. It taught me a lot about how many different ways there are to model a problem. I will try to explain how this model works in more detail in the upcoming steps.

How does this model work?

For each series, we look back from certain points in time -- with one-week gaps in between -- and derive features that describe the behaviour of the series up until that point in time.

This is the main difference from the regular approach: we derive features that explain the characteristics of the entire series up until that point in time, instead of features that describe the characteristics of one single day.

The image above shows the points in time at which the features are generated. The gaps between them are one week, which aims to capture the weekly cycle in the data. The arrows look back to represent that the features at each point are created from the history up to that point.

Once the features are generated, we create 28 models that learn the relationship between these features of a series and the sales 1, 2, 3, ..., 28 days ahead. Each model learns the relationship between the same derived features and a different forecasting horizon, so for each training example we have 28 different labels.
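A rough sketch of this layout -- assuming train_df is the wide sales frame built later in the notebook, with one date column per day:

from datetime import date
import pandas as pd

anchor = date(2016, 4, 25)                       # one of the weekly look-back points
label_cols = pd.date_range(anchor, periods=28)   # the next 28 days
y = train_df[label_cols].to_numpy()              # shape (30490 series, 28 labels)
# model k (k = 0..27) is trained on the same feature matrix X with target y[:, k]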

What are potential improvements?

  • Creating linear regression models to capture the trend of each series -- decision trees cannot extrapolate, so they struggle with trend -- and adding their predictions as features would definitely help, as it helped the regular approach (a rough sketch follows this list).
  • Trying different custom loss functions.
  • Predicting higher levels of the hierarchy and reconciling down to the lower levels (top-down approach). Thanks @chrisrichardmiles for suggesting that.
  • The notebook linked in the model-training section clusters the examples according to their intermittent-demand related characteristics. I used these clusters to train models separately.
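For the first item, here is a minimal sketch of what such trend features could look like -- a vectorized least-squares line per series over a recent window, extrapolated to the 28 horizons. The function name, window size and column names are my own illustration, not part of this pipeline:

import numpy as np
import pandas as pd

def linear_trend_features(df, date_from, window=90):
    # df: wide sales frame (rows = series, columns = daily dates), as used in fe() below
    cols = pd.date_range(date_from - pd.Timedelta(days=window), periods=window)
    y = df[cols].to_numpy(dtype=float)                       # (n_series, window)
    x = np.arange(window)
    xc = x - x.mean()
    slope = (xc * (y - y.mean(axis=1, keepdims=True))).sum(axis=1) / (xc ** 2).sum()
    intercept = y.mean(axis=1) - slope * x.mean()
    # extrapolate the fitted line h days past the window for h = 1..28
    return pd.DataFrame({"trend_pred_next_%s" % h: intercept + slope * (window - 1 + h)
                         for h in range(1, 29)})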

What is the main advantage of using this approach?

It is a day-by-day model -- it consists of 28 separate models -- so we have the flexibility to modify the models for different days. At the same time, training is so fast that we can quickly try different ideas and see how the model performance reacts. Looking at the public kernels, this combination of flexibility and pace is rare and valuable.

The ability to try ideas quickly is critical here because most people -- including me -- are struggling to create a trustworthy CV strategy, so being able to search for a correlation between CV and LB much more quickly matters. Also, if you have an existing CV setup that takes hours to run and you are sick of it, you may prefer to pivot to this approach.

Notes

  • I calculated the WRMSSE weights and constants on my local computer and loaded them into the kernel as pickle files, so the additional .pkl files are just these weights.

  • Even though the code runs quite fast, I created checkpoint functions such as save_obj() and load_obj() and commented them out, for those of you who want to avoid re-running the pre-processing and feature engineering parts every time.

  • I mainly created 4 datasets in pre-processing to derive the features from. snap_df is not used in this notebook, but it is useful for developing further features.

In [1]:
import wrmsse_utility as w
from datetime import date, timedelta
import gc
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
import pickle
from tqdm import tqdm
import lightgbm as lgb
import random
import warnings
warnings.filterwarnings('ignore')

def save_obj(obj, name):
    with open(name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
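
# Counterpart of save_obj() mentioned in the notes above -- an assumed minimal
# implementation so that the commented-out checkpoints can be re-loaded.
def load_obj(name):
    with open(name + '.pkl', 'rb') as f:
        return pickle.load(f)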

# Define functions for loading pickle files during upcoming steps.
def load_weights2(name):
    with open('../input/weights2/' + name + '.pkl', 'rb') as f:
        return pickle.load(f)
    
def load_avoid_spike(name):
    with open('../input/avoid-spike/' + name + '.pkl', 'rb') as f:
        return pickle.load(f)
    
def load_from_scratch(name):
    with open('../input/from-scratch/' + name + '.pkl', 'rb') as f:
        return pickle.load(f)
    
def load_clusters(name):
    with open('../input/clusters/' + name + '.pkl', 'rb') as f:
        return pickle.load(f)

Data Preprocessing

In [2]:
train_df = pd.read_csv("../input/m5-forecasting-accuracy/sales_train_evaluation.csv")
calendar = pd.read_csv("../input/m5-forecasting-accuracy/calendar.csv")
price = pd.read_csv("../input/m5-forecasting-accuracy/sell_prices.csv")
submission = pd.read_csv("../input/m5-forecasting-accuracy/sample_submission.csv")

category_cols = ['item_id', 'dept_id', 'cat_id', 'store_id', 'state_id']
no_id_id_columns = ['item_id', 'store_id', 'cat_id', 'dept_id', 'state_id']
id_cols = ["id"] + category_cols
id_df = train_df[id_cols]  # id_df not encoded
print("running")
# Label Encoding of categorical variables
mapper = {}
for col in category_cols:
    le = LabelEncoder()
    mapper[col] = dict(zip(le.fit_transform(train_df[col]), train_df[col]))
    train_df[col] = le.fit_transform(train_df[col])

multi_indexes = train_df.set_index(category_cols).index  # multi_indexes are encoded
INDEX_ITEM = multi_indexes.get_level_values("item_id")
INDEX_DEPT = multi_indexes.get_level_values("dept_id")
INDEX_STORE = multi_indexes.get_level_values("store_id")


# Create ordered_ids_and_weights -- Weights are pre-calculated
agg_level_to_denominator = load_weights2("agg_to_denominator2")
agg_level_to_weight = load_weights2("agg_to_weight2")

final_multiplier = (agg_level_to_weight[11]) / agg_level_to_denominator[11]
final_multiplier = final_multiplier.reset_index()
final_multiplier["id"] = final_multiplier["item_id"] + "_" + final_multiplier["store_id"] + "_evaluation"
final_multiplier.drop(["item_id", "store_id"], axis=1, inplace=True)
del agg_level_to_weight, agg_level_to_denominator
gc.collect()

ordered_ids_and_weights = pd.merge(id_df, final_multiplier, on=["id"], how="left")
ordered_ids_and_weights = ordered_ids_and_weights[[0]]
ordered_ids_and_weights.index = multi_indexes
ordered_ids_and_weights = ordered_ids_and_weights.reset_index()
ordered_ids_and_weights = ordered_ids_and_weights.rename({0: "weights"}, axis=1)


# Set train_df column names to dates
train_df.set_index(keys=id_cols, inplace=True)
start_date = date(2011, 1, 29)
train_df.columns = pd.date_range(start_date, freq="D", periods=1941)
train_df.reset_index(inplace=True)


# Calendar
calendar["date"] = pd.to_datetime(calendar.date)

# Preprocess price -- represent dates as columns
price_df = pd.merge(price, id_df, how="left", on=["item_id", "store_id"])

tmp = calendar[["wm_yr_wk", "date"]]
tmp = tmp.groupby("wm_yr_wk").agg(list).reset_index()
price_df = pd.merge(price_df, tmp, how="left", on="wm_yr_wk")
price_df = price_df.explode("date")
price_df.drop(["wm_yr_wk"], axis=1, inplace=True)

price_df = price_df.set_index(id_cols + ["date"])
price_df = price_df[["sell_price"]].unstack()
price_df.columns = price_df.columns.droplevel()
price_df.reset_index(inplace=True)


# Preprocess Calendar and SNAP
tmp = calendar[["date", "event_name_1"]]


tmp2 = tmp.dropna(axis=0)
calendar_df = pd.DataFrame(columns=pd.date_range(start_date, freq="D", periods=1913 + 56),
                           index=tmp2.event_name_1.unique())
tmp3 = tmp2.groupby("event_name_1").agg(list).reset_index()
a = zip(tmp3["event_name_1"], tmp3["date"])
for row, col in a: calendar_df.loc[row, col] = 1

snap_ca = pd.DataFrame(index=["CA"], columns=pd.date_range(start_date, freq="D", periods=1913 + 56))
snap_tx = pd.DataFrame(index=["TX"], columns=pd.date_range(start_date, freq="D", periods=1913 + 56))
snap_wa = pd.DataFrame(index=["WI"], columns=pd.date_range(start_date, freq="D", periods=1913 + 56))

snap_ca.loc["CA", :] = calendar["snap_CA"].values
snap_tx.loc["TX", :] = calendar["snap_TX"].values
snap_wa.loc["WI", :] = calendar["snap_WI"].values
snap_df = pd.concat([snap_ca, snap_tx, snap_wa])

calendar_df = calendar_df.fillna(0)

# Remove repeating records in calendar and create separate series for NBA and Ramadan
calendar_df = calendar_df.loc[calendar_df.index!= "LentWeek2",]
calendar_df.iloc[-1,:] = calendar_df.iloc[6,:].values + calendar_df.iloc[-1,:].values # Easter Fix
calendar_df.iloc[-6,:] = calendar_df.iloc[-3,:].values + calendar_df.iloc[-6,:].values # Christmas Fix

NBA = (calendar_df.iloc[11,:].values + calendar_df.iloc[12,:].values).cumsum() %2
NBA = pd.Series(NBA, index=calendar_df.columns)
RAMADAN = (calendar_df.iloc[15,:].values + calendar_df.iloc[16,:].values).cumsum() %2
RAMADAN = pd.Series(RAMADAN, index=calendar_df.columns)

calendar_df = calendar_df.loc[~calendar_df.index.isin(["OrthodoxEaster","OrthodoxChristmas","NBAFinalsStart",\
                                                       "NBAFinalsEnd","Ramadan starts","Eid al-Fitr"]),]

event_mapper = dict(zip(list(range(1,24)),list(calendar_df.index)))
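# Give each remaining event row a unique integer code (1..23, mapped back by event_mapper)
# and collapse the one-hot rows into a single integer-coded daily series (0 = no event)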
calendar_df = ((np.arange(1,24) * calendar_df.T).T).sum()


mapper_back_state = {v: k for k, v in mapper["state_id"].items()}
snap_df.index = snap_df.index.map(mapper_back_state)


del snap_ca, snap_tx, snap_wa, tmp, tmp2, tmp3, calendar, price, mapper_back_state

print("Dataframes are ready...")
running
Dataframes are ready...

Feature Engineering

In [3]:
# Select only the part of the data that will be used in training and feature engineering
def sample_from_train(train_df, start, end):
    return train_df[ id_cols + list(pd.date_range(start, end))]

# Select `periods` columns at frequency `freq`, starting `minus` days before date_from
def get_timespan(df, date_from, minus, periods, freq="D"):
    return df[pd.date_range(date_from - timedelta(days=minus), periods=periods, freq=freq)]
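# e.g. get_timespan(df, date(2016, 4, 25), 7, 7) returns the 7 daily columns
# from 2016-04-18 to 2016-04-24, i.e. the week ending the day before date_from.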


# Create sales related features
def fe(df, date_from, get_label=True, name_prefix=None):
    X = dict()
    
    # Window sizes 3, 7, 30, 180 -- selected by intuition and trial
    for i in [3, 7, 30, 180]:
        tmp = get_timespan(df, date_from, i, i)
        X['diff_%s_mean' % i] = tmp.diff(axis=1).mean(axis=1).values
        X['mean_%s_decay' % i] = (tmp * np.power(0.9, np.arange(i)[::-1])).sum(axis=1).values
        X['mean_%s' % i] = tmp.mean(axis=1).values
        X['median_%s' % i] = tmp.median(axis=1).values
        X['min_%s' % i] = tmp.min(axis=1).values
        X['max_%s' % i] = tmp.max(axis=1).values
        X['std_%s' % i] = tmp.std(axis=1).values
    
    # Window sizes 3, 7, 15, 30 -- selected by intuition and trial
    for i in [3, 7, 15, 30]:
        tmp = get_timespan(df, date_from + timedelta(days=-7), i, i)
        X['diff_%s_mean_2' % i] = tmp.diff(axis=1).mean(axis=1).values
        X['mean_%s_decay_2' % i] = (tmp * np.power(0.9, np.arange(i)[::-1])).sum(axis=1).values
        X['mean_%s_2' % i] = tmp.mean(axis=1).values
        X['median_%s_2' % i] = tmp.median(axis=1).values
        X['min_%s_2' % i] = tmp.min(axis=1).values
        X['max_%s_2' % i] = tmp.max(axis=1).values
        X['std_%s_2' % i] = tmp.std(axis=1).values
    
    # Window sizes 3, 7, 14, 30, 180 -- selected by intuition and trial
    for i in [3, 7, 14, 30, 180]:
        tmp = get_timespan(df, date_from, i, i)
        X['has_sales_days_in_last_%s' % i] = (tmp > 0).sum(axis=1).values
        X['last_has_sales_day_in_last_%s' % i] = i - ((tmp > 0) * np.arange(i)).max(axis=1).values
        X['first_has_sales_day_in_last_%s' % i] = ((tmp > 0) * np.arange(i, 0, -1)).max(axis=1).values
        X['Number_of_days_to_max_sales_in_last_%s' % i] = (pd.to_datetime(date_from) - pd.to_datetime(
            get_timespan(df, date_from, i, i).idxmax(axis=1).values)).days.values
        X['Number_of_days_to_min_sales_in_last_%s' % i] = (pd.to_datetime(date_from) - pd.to_datetime(
            get_timespan(df, date_from, i, i).idxmin(axis=1).values)).days.values

    # Lag features of last 7 days
    for i in range(1, 8):
        X["lag_%s" % i] = get_timespan(df, date_from, i, 1).values.ravel()
    
    # Day-of-week averages over 4, 8, 13, 26 and 52 week windows
    for i in range(7):
        X["mean_4_dow_%s" % i] = get_timespan(df, date_from, 4 * 7 - i, 4, freq="7D").mean(axis=1).values
        X["mean_8_dow_%s" %i] = get_timespan(df, date_from, 8 * 7 - i, 8, freq="7D").mean(axis=1).values
        X["mean_13_dow_%s" % i] = get_timespan(df, date_from, 13 * 7 - i, 13, freq="7D").mean(axis=1).values
        X["mean_26_dow_%s" % i] = get_timespan(df, date_from, 26 * 7 - i, 26, freq="7D").mean(axis=1).values
        X["mean_52_dow_%s" % i] = get_timespan(df, date_from, 52 * 7 - i, 52, freq="7D").mean(axis=1).values

    X = pd.DataFrame(X)

    if name_prefix is not None:
        X.columns = ['%s_%s' % (name_prefix, c) for c in X.columns]
        return X
    
    # LABELS ARE SALES VALUES OF NEXT 28 DAYS
    if get_label:
        y = df[pd.date_range(date_from, periods=28)].values
        return X, y
    else:
        return X
    
    
# Create event related features
def calendar_fe(calendar_df, date_from, number_of_series=30490, add_bulk=False):
    X={}

    for i in [7, 14, 28]:
        tmp = get_timespan(calendar_df, date_from, i, i)
        if ((tmp > 0) * np.arange(i)).max() == 0:
            X["days_after_last_event_in_last_%s_days" %i] = np.repeat(np.NaN,number_of_series)
        else:
            X["days_after_last_event_in_last_%s_days" %i] = np.repeat(i - ((tmp > 0) * np.arange(i)).max(), number_of_series)


    for i in [7, 14, 28]:
        tmp = get_timespan(calendar_df, date_from + timedelta(days=i), i, i)
        
        if ((tmp > 0 ) * np.arange(i, 0, -1)).max() ==0:
            X["days_to_NEXT_event_in_%s"%i] = np.repeat(np.NaN,number_of_series)
        else:
            X["days_to_NEXT_event_in_%s"%i] = np.repeat(i - ((tmp > 0 ) * np.arange(i, 0, -1)).max(), number_of_series)
    
    X = pd.DataFrame(X)
    
    nba_tmp = get_timespan(NBA, date_from + timedelta(days=28), 28, 28)
    nba_tmp = np.array([nba_tmp]*30490)
    nba_tmp = pd.DataFrame(nba_tmp, columns=["NBA_AT_NEXT_%s_CATEGORICAL"%i for i in range(1,29)])
    
    ramadan_tmp = get_timespan(RAMADAN, date_from + timedelta(days=28), 28, 28)
    ramadan_tmp = np.array([ramadan_tmp]*30490)
    ramadan_tmp = pd.DataFrame(ramadan_tmp, columns=["RAMADAN_AT_NEXT_%s_CATEGORICAL"%i for i in range(1,29)])
    
    cal_tmp = get_timespan(calendar_df, date_from + timedelta(days=28), 28, 28)
    cal_tmp = np.array([cal_tmp]*30490)
    cal_tmp = pd.DataFrame(cal_tmp, columns=["EVENT_AT_NEXT_%s_CATEGORICAL"%i for i in range(1,29)])
    
    X = pd.concat([X,nba_tmp,ramadan_tmp,cal_tmp],axis=1)
    return X


def price_fe(price_df, date_from):
    X={}
    for i in [28]:
        tmp = get_timespan(price_df, date_from + timedelta(days=i), i, i)
        X["max_price_NEXT_%s_days" %i] = tmp.max(axis=1).values
        X["min_price_NEXT_%s_days" % i] = tmp.min(axis=1).values
        X["days_to_firt_price_drop_in_NEXT_%s_days_CATEGORICAL"] = (((tmp.diff(axis=1)<0)*np.arange(i)).max(axis=1)).replace(0,np.NaN)
        X["days_to_firt_price_increase_in_NEXT_%s_days_CATEGORICAL"] = (((tmp.diff(axis=1)>0)*np.arange(i)).max(axis=1)).replace(0,np.NaN)


    for i in [7,28,180]:
        tmp = get_timespan(price_df, date_from, i, i)
        X["days_after_last_price_drop_%s_days"%i] = (i - ((tmp.diff(axis=1)<0)*np.arange(i)).max(axis=1)).replace(i,np.NaN)
        X["days_after_last_price_increse_%s_days"%i] = (i - ((tmp.diff(axis=1)>0)*np.arange(i)).max(axis=1)).replace(i,np.NaN)
        
        X["max_price_last_%s_days" % i] = tmp.max(axis=1).values
        X["min_price_last_%s_days" % i] = tmp.min(axis=1).values
        X["percent_price_change_last_%s_days" % i] = (X["max_price_last_%s_days" % i] - X[
            "min_price_last_%s_days" % i]) / X["min_price_last_%s_days" % i]
        X["price_NA_last_%s_days" % i] = tmp.isna().sum(axis=1).values

    X = pd.DataFrame(X)
    return X

Create train, validation and test sets

In [4]:
def create_train_and_val_as_list_of_df(train_df, item_df, store_dept_df, calendar_df, price_df, multi_indexes, val_from, number_of_weeks):
    X_l = []
    y_l = []
    weights = ordered_ids_and_weights.copy()

    for i in tqdm(range(number_of_weeks), leave=False):
        dt_from = val_from + timedelta(days=- i*7)
        
        X, y = fe(train_df, dt_from, get_label=True)
        X_item = fe(item_df, dt_from, get_label=False, name_prefix="Item")
        X_item = X_item.reindex(INDEX_ITEM).reset_index(drop=True)
        X_store_dept = fe(store_dept_df, dt_from, get_label=False, name_prefix="Store_Dept")
        X_store_dept.index = original_store_dept_index
        X_store_dept = X_store_dept.reindex(pd.MultiIndex.from_arrays([INDEX_STORE,INDEX_DEPT])).reset_index(drop=True)
        
        X_calendar = calendar_fe(calendar_df, dt_from)
        X_price = price_fe(price_df, dt_from)

        weights["weights"] *= (0.997**i)
        X_l.append(w.reduce_mem_usage(pd.concat([X, X_item, X_store_dept, X_calendar, X_price, weights], axis=1)))
        y_l.append(y)

    return X_l, y_l

# Create the same features, this time describing the state of each series just before the test start date
def create_test_as_df(train_df, item_df, store_dept_df, calendar_df, price_df, multi_indexes, test_from):
    X_test = fe(train_df, test_from, get_label=False)
    X_item = fe(item_df, test_from, get_label=False, name_prefix="Item")
    X_item = X_item.reindex(INDEX_ITEM).reset_index(drop=True)
    X_store_dept = fe(store_dept_df, test_from, get_label=False, name_prefix="Store_Dept")
    X_store_dept.index = original_store_dept_index
    X_store_dept = X_store_dept.reindex(pd.MultiIndex.from_arrays([INDEX_STORE,INDEX_DEPT])).reset_index(drop=True)
    
    X_calendar = calendar_fe(calendar_df, test_from)
    X_price = price_fe(price_df, test_from)

    return pd.concat([X_test, X_item, X_store_dept, X_calendar, X_price, ordered_ids_and_weights], axis=1)



# Define custom cost function (asymmetric squared error) -- from Ragnar's
def custom_asymmetric_train(y_pred, y_true):
    y_true = y_true.get_label()
    residual = (y_true - y_pred).astype("float")
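    # residual > 0 means the model under-forecast; the 1.10 factor below penalizes
    # under-forecasting about 10% more than over-forecasting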
    grad = np.where(residual < 0, -2 * residual, -2 * residual * 1.10)
    hess = np.where(residual < 0, 2, 2 * 1.10)
    return grad, hess
In [5]:
train_start = date(2013,1,1)
train_end = date(2016, 4, 24)
validation_start = date(2016, 4, 25)
validation_end = date(2016, 5, 22)
test_start =  date(2016, 5, 23)
test_end =  date(2016, 6, 19)

number_of_weeks = 60

#### If you want to try CV on the same periods of previous years, the validation weeks to pop are [48, 100] ####
###  pd.date_range includes both endpoints when start and end are given ###
###  it includes only the start and adds `periods` steps when used with periods and freq ###

# Select fresh examples
train_df = sample_from_train(train_df, train_start, validation_end)
price_df = sample_from_train(price_df, train_start, test_end)
item_df = train_df.groupby(["item_id"]).sum().iloc[:,4:]
store_dept_df = train_df.groupby(["store_id","dept_id"]).sum().iloc[:,3:]
original_store_dept_index = store_dept_df.index


# Create train, labels  and test
X_l, y_l = create_train_and_val_as_list_of_df(train_df, item_df, store_dept_df, calendar_df, price_df,\
                                              multi_indexes, validation_start, number_of_weeks)


test_X = create_test_as_df(train_df, item_df, store_dept_df, calendar_df, price_df, multi_indexes, test_start)


val_X = X_l.pop(0)
val_y = y_l.pop(0)


#Make train_X and train labels
train_X = pd.concat(X_l, axis=0)
train_y = np.concatenate(y_l, axis=0)
del X_l,y_l
gc.collect()


excld_features = ["Store_Dept_last_has_sales_day_in_last_3",
"Store_Dept_has_sales_days_in_last_7",
"Store_Dept_last_has_sales_day_in_last_7",
"Store_Dept_first_has_sales_day_in_last_7",
"Store_Dept_last_has_sales_day_in_last_14",
"Store_Dept_first_has_sales_day_in_last_14",
"Store_Dept_last_has_sales_day_in_last_30",
"Store_Dept_first_has_sales_day_in_last_30",
"Store_Dept_last_has_sales_day_in_last_180",
"Store_Dept_first_has_sales_day_in_last_180",
"NBA_AT_NEXT_1_CATEGORICAL",
"NBA_AT_NEXT_2_CATEGORICAL",
"NBA_AT_NEXT_3_CATEGORICAL",
"NBA_AT_NEXT_4_CATEGORICAL",
"NBA_AT_NEXT_5_CATEGORICAL",
"NBA_AT_NEXT_6_CATEGORICAL",
"NBA_AT_NEXT_7_CATEGORICAL",
"NBA_AT_NEXT_8_CATEGORICAL",
"NBA_AT_NEXT_9_CATEGORICAL",
"NBA_AT_NEXT_10_CATEGORICAL",
"NBA_AT_NEXT_11_CATEGORICAL",
"NBA_AT_NEXT_12_CATEGORICAL",
"NBA_AT_NEXT_13_CATEGORICAL",
"NBA_AT_NEXT_14_CATEGORICAL",
"NBA_AT_NEXT_15_CATEGORICAL",
"NBA_AT_NEXT_16_CATEGORICAL",
"NBA_AT_NEXT_17_CATEGORICAL",
"NBA_AT_NEXT_18_CATEGORICAL",
"NBA_AT_NEXT_19_CATEGORICAL",
"NBA_AT_NEXT_20_CATEGORICAL",
"NBA_AT_NEXT_21_CATEGORICAL",
"NBA_AT_NEXT_22_CATEGORICAL",
"NBA_AT_NEXT_23_CATEGORICAL",
"NBA_AT_NEXT_24_CATEGORICAL",
"NBA_AT_NEXT_25_CATEGORICAL",
"NBA_AT_NEXT_26_CATEGORICAL",
"NBA_AT_NEXT_27_CATEGORICAL",
"NBA_AT_NEXT_28_CATEGORICAL",
"RAMADAN_AT_NEXT_1_CATEGORICAL",
"RAMADAN_AT_NEXT_2_CATEGORICAL",
"RAMADAN_AT_NEXT_3_CATEGORICAL",
"RAMADAN_AT_NEXT_4_CATEGORICAL",
"RAMADAN_AT_NEXT_5_CATEGORICAL",
"RAMADAN_AT_NEXT_6_CATEGORICAL",
"RAMADAN_AT_NEXT_7_CATEGORICAL",
"RAMADAN_AT_NEXT_8_CATEGORICAL",
"RAMADAN_AT_NEXT_9_CATEGORICAL",
"RAMADAN_AT_NEXT_10_CATEGORICAL",
"RAMADAN_AT_NEXT_11_CATEGORICAL",
"RAMADAN_AT_NEXT_12_CATEGORICAL",
"RAMADAN_AT_NEXT_13_CATEGORICAL",
"RAMADAN_AT_NEXT_14_CATEGORICAL",
"RAMADAN_AT_NEXT_15_CATEGORICAL",
"RAMADAN_AT_NEXT_16_CATEGORICAL",
"RAMADAN_AT_NEXT_17_CATEGORICAL",
"RAMADAN_AT_NEXT_18_CATEGORICAL",
"RAMADAN_AT_NEXT_19_CATEGORICAL",
"RAMADAN_AT_NEXT_20_CATEGORICAL",
"RAMADAN_AT_NEXT_21_CATEGORICAL",
"RAMADAN_AT_NEXT_22_CATEGORICAL",
"RAMADAN_AT_NEXT_23_CATEGORICAL",
"RAMADAN_AT_NEXT_24_CATEGORICAL",
"RAMADAN_AT_NEXT_25_CATEGORICAL",
"RAMADAN_AT_NEXT_26_CATEGORICAL",
"RAMADAN_AT_NEXT_27_CATEGORICAL",
"RAMADAN_AT_NEXT_28_CATEGORICAL",
"EVENT_AT_NEXT_2_CATEGORICAL",
"EVENT_AT_NEXT_6_CATEGORICAL",
"EVENT_AT_NEXT_9_CATEGORICAL",
"EVENT_AT_NEXT_16_CATEGORICAL",
"EVENT_AT_NEXT_18_CATEGORICAL",
"EVENT_AT_NEXT_21_CATEGORICAL",
"EVENT_AT_NEXT_23_CATEGORICAL",
"EVENT_AT_NEXT_25_CATEGORICAL",
"EVENT_AT_NEXT_26_CATEGORICAL",
"EVENT_AT_NEXT_28_CATEGORICAL",
"days_to_firt_price_drop_in_NEXT_%s_days_CATEGORICAL",
"days_to_firt_price_increase_in_NEXT_%s_days_CATEGORICAL",
"days_after_last_price_increse_7_days",
"price_NA_last_7_days",
"price_NA_last_28_days",
"dept_id"]

train_X = train_X.drop(excld_features, axis=1)
val_X = val_X.drop(excld_features, axis=1)
test_X = test_X.drop(excld_features, axis=1)


features = [col for col in val_X.columns if col != "weights"]

category_features = [col for col in features if "_CATEGORICAL" in col]

submission_name="submission.csv"
print("Number of Features: ", len(features))
gc.collect()
  2%|▏         | 1/60 [00:05<05:36,  5.71s/it]
Mem. usage decreased to 23.96 Mb (78.9% reduction)
  3%|▎         | 2/60 [00:11<05:26,  5.63s/it]
Mem. usage decreased to 23.96 Mb (78.9% reduction)
  5%|▌         | 3/60 [00:16<05:20,  5.63s/it]
Mem. usage decreased to 24.08 Mb (78.7% reduction)
  7%|▋         | 4/60 [00:22<05:13,  5.59s/it]
Mem. usage decreased to 23.96 Mb (78.9% reduction)
  8%|▊         | 5/60 [00:27<05:09,  5.62s/it]
Mem. usage decreased to 23.96 Mb (78.9% reduction)
 10%|█         | 6/60 [00:33<05:07,  5.70s/it]
Mem. usage decreased to 23.32 Mb (79.4% reduction)
 12%|█▏        | 7/60 [00:40<05:21,  6.06s/it]
Mem. usage decreased to 23.93 Mb (78.9% reduction)
 13%|█▎        | 8/60 [00:48<05:38,  6.51s/it]
Mem. usage decreased to 23.93 Mb (78.9% reduction)
 15%|█▌        | 9/60 [00:56<05:50,  6.88s/it]
Mem. usage decreased to 24.13 Mb (78.7% reduction)
 17%|█▋        | 10/60 [01:03<05:59,  7.18s/it]
Mem. usage decreased to 23.99 Mb (78.8% reduction)
 18%|█▊        | 11/60 [01:09<05:32,  6.78s/it]
Mem. usage decreased to 23.32 Mb (79.4% reduction)
 20%|██        | 12/60 [01:15<05:12,  6.50s/it]
Mem. usage decreased to 23.32 Mb (79.4% reduction)
 22%|██▏       | 13/60 [01:21<04:56,  6.31s/it]
Mem. usage decreased to 23.58 Mb (79.2% reduction)
 23%|██▎       | 14/60 [01:27<04:42,  6.14s/it]
Mem. usage decreased to 23.70 Mb (79.1% reduction)
 25%|██▌       | 15/60 [01:35<04:58,  6.63s/it]
Mem. usage decreased to 23.38 Mb (79.4% reduction)
 27%|██▋       | 16/60 [01:40<04:42,  6.43s/it]
Mem. usage decreased to 23.35 Mb (79.4% reduction)
 28%|██▊       | 17/60 [01:46<04:27,  6.23s/it]
Mem. usage decreased to 23.00 Mb (79.7% reduction)
 30%|███       | 18/60 [01:54<04:34,  6.54s/it]
Mem. usage decreased to 23.06 Mb (79.6% reduction)
 32%|███▏      | 19/60 [01:59<04:17,  6.29s/it]
Mem. usage decreased to 23.64 Mb (79.1% reduction)
 33%|███▎      | 20/60 [02:05<04:06,  6.17s/it]
Mem. usage decreased to 23.87 Mb (78.9% reduction)
 35%|███▌      | 21/60 [02:11<03:55,  6.04s/it]
Mem. usage decreased to 23.76 Mb (79.0% reduction)
 37%|███▋      | 22/60 [02:17<03:45,  5.93s/it]
Mem. usage decreased to 23.76 Mb (79.0% reduction)
 38%|███▊      | 23/60 [02:22<03:36,  5.84s/it]
Mem. usage decreased to 23.61 Mb (79.2% reduction)
 40%|████      | 24/60 [02:28<03:31,  5.86s/it]
Mem. usage decreased to 23.73 Mb (79.1% reduction)
 42%|████▏     | 25/60 [02:34<03:23,  5.81s/it]
Mem. usage decreased to 23.70 Mb (79.1% reduction)
 43%|████▎     | 26/60 [02:40<03:26,  6.06s/it]
Mem. usage decreased to 23.67 Mb (79.1% reduction)
 45%|████▌     | 27/60 [02:46<03:16,  5.94s/it]
Mem. usage decreased to 23.87 Mb (78.9% reduction)
 47%|████▋     | 28/60 [02:52<03:10,  5.96s/it]
Mem. usage decreased to 23.84 Mb (79.0% reduction)
 48%|████▊     | 29/60 [03:00<03:22,  6.52s/it]
Mem. usage decreased to 23.84 Mb (79.0% reduction)
 50%|█████     | 30/60 [03:06<03:09,  6.31s/it]
Mem. usage decreased to 23.84 Mb (79.0% reduction)
 52%|█████▏    | 31/60 [03:11<02:55,  6.05s/it]
Mem. usage decreased to 23.87 Mb (78.9% reduction)
 53%|█████▎    | 32/60 [03:17<02:46,  5.93s/it]
Mem. usage decreased to 23.90 Mb (78.9% reduction)
 55%|█████▌    | 33/60 [03:22<02:37,  5.85s/it]
Mem. usage decreased to 23.99 Mb (78.8% reduction)
 57%|█████▋    | 34/60 [03:30<02:46,  6.40s/it]
Mem. usage decreased to 24.11 Mb (78.7% reduction)
 58%|█████▊    | 35/60 [03:36<02:34,  6.18s/it]
Mem. usage decreased to 24.37 Mb (78.5% reduction)
 60%|██████    | 36/60 [03:42<02:25,  6.05s/it]
Mem. usage decreased to 24.57 Mb (78.3% reduction)
 62%|██████▏   | 37/60 [03:49<02:30,  6.54s/it]
Mem. usage decreased to 24.51 Mb (78.4% reduction)
 63%|██████▎   | 38/60 [03:55<02:18,  6.28s/it]
Mem. usage decreased to 24.72 Mb (78.2% reduction)
 65%|██████▌   | 39/60 [04:01<02:08,  6.12s/it]
Mem. usage decreased to 24.72 Mb (78.2% reduction)
 67%|██████▋   | 40/60 [04:06<01:59,  5.99s/it]
Mem. usage decreased to 24.48 Mb (78.4% reduction)
 68%|██████▊   | 41/60 [04:12<01:51,  5.85s/it]
Mem. usage decreased to 24.51 Mb (78.4% reduction)
 70%|███████   | 42/60 [04:20<01:55,  6.39s/it]
Mem. usage decreased to 24.37 Mb (78.5% reduction)
 72%|███████▏  | 43/60 [04:27<01:53,  6.69s/it]
Mem. usage decreased to 24.16 Mb (78.7% reduction)
 73%|███████▎  | 44/60 [04:32<01:41,  6.35s/it]
Mem. usage decreased to 23.67 Mb (79.1% reduction)
 75%|███████▌  | 45/60 [04:38<01:31,  6.10s/it]
Mem. usage decreased to 23.67 Mb (79.1% reduction)
 77%|███████▋  | 46/60 [04:45<01:27,  6.26s/it]
Mem. usage decreased to 23.76 Mb (79.0% reduction)
 78%|███████▊  | 47/60 [04:50<01:17,  5.99s/it]
Mem. usage decreased to 23.93 Mb (78.9% reduction)
 80%|████████  | 48/60 [04:55<01:10,  5.83s/it]
Mem. usage decreased to 23.81 Mb (79.0% reduction)
 82%|████████▏ | 49/60 [05:01<01:02,  5.70s/it]
Mem. usage decreased to 23.61 Mb (79.2% reduction)
 83%|████████▎ | 50/60 [05:08<01:02,  6.26s/it]
Mem. usage decreased to 23.79 Mb (79.0% reduction)
 85%|████████▌ | 51/60 [05:14<00:54,  6.03s/it]
Mem. usage decreased to 23.81 Mb (79.0% reduction)
 87%|████████▋ | 52/60 [05:22<00:52,  6.51s/it]
Mem. usage decreased to 23.81 Mb (79.0% reduction)
 88%|████████▊ | 53/60 [05:27<00:43,  6.16s/it]
Mem. usage decreased to 24.05 Mb (78.8% reduction)
 90%|█████████ | 54/60 [05:32<00:35,  5.99s/it]
Mem. usage decreased to 24.02 Mb (78.8% reduction)
 92%|█████████▏| 55/60 [05:40<00:31,  6.39s/it]
Mem. usage decreased to 23.76 Mb (79.0% reduction)
 93%|█████████▎| 56/60 [05:45<00:24,  6.16s/it]
Mem. usage decreased to 23.38 Mb (79.4% reduction)
 95%|█████████▌| 57/60 [05:52<00:18,  6.17s/it]
Mem. usage decreased to 23.52 Mb (79.2% reduction)
 97%|█████████▋| 58/60 [05:57<00:11,  5.96s/it]
Mem. usage decreased to 23.55 Mb (79.2% reduction)
 98%|█████████▊| 59/60 [06:03<00:05,  5.86s/it]
Mem. usage decreased to 23.61 Mb (79.2% reduction)
                                               
Mem. usage decreased to 23.49 Mb (79.3% reduction)
Number of Features:  404
Out[5]:
0

Model Training

In [6]:
submission = pd.read_csv("../input/m5-forecasting-accuracy/sample_submission.csv")
df_val_pred = submission.iloc[:30490, 1:].copy()
submission = submission.iloc[30490:, 1:].reset_index(drop=True)

# Pre-determined clusters, generated in the notebook linked here:
# https://www.kaggle.com/anlgrbz/clustering-with-intermittent-demand-related-featrs
id0 = load_clusters("id_gm0")
id1 = load_clusters("id_gm1")
id2 = load_clusters("id_gm2")

idx0 = id0.index
idx1 = id1.index
idx2 = id2.index

train_y = pd.DataFrame(train_y,index=train_X.index)
val_y = pd.DataFrame(val_y,index=val_X.index)

# Training configuration. Early stopping uses the last 28 days as the validation set.
# For a final model, early stopping can be replaced with the observed number of iterations
# and the last 28 days added back to the training set, as they contain crucial information.
params = {
    'boosting_type': 'gbdt',
    'metric': 'rmse',
    'objective': 'custom',
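    # 'custom' objective: gradients/hessians come from custom_asymmetric_train, passed as fobj to lgb.train below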
    #"tweedie_variance_power": 1.1,
    'n_jobs': -1,
    'seed': 236,
    "num_leaves": 63,
    'learning_rate': 0.1,
    'bagging_fraction': 0.75,
    'bagging_freq': 10,
    'colsample_bytree': 0.6,
    "num_boost_round": 2500,
    "early_stopping_rounds": 50,
    "min_data_in_leaf": 30}

# Model training loop for 28 different horizons
for i in range(28):
    print("=" * 50)
    print("Fold%s" % (i + 1))
    print("=" * 50)
    
    # For each horizon, 3 models are trained so that products with similar intermittent-demand characteristics are modelled together.
    for j, idx in enumerate([idx0,idx1,idx2]):
        
        # Create LightGBM datasets for current fold
        train_set = lgb.Dataset(train_X.loc[train_X.index.isin(idx),features], 
                                train_y.loc[train_y.index.isin(idx), i], 
                                weight=train_X.loc[train_X.index.isin(idx),"weights"],
                                categorical_feature=category_features)
        
        val_set = lgb.Dataset(val_X.loc[val_X.index.isin(idx),features], 
                              val_y.loc[val_y.index.isin(idx), i], 
                              weight=val_X.loc[val_X.index.isin(idx),"weights"],
                              categorical_feature=category_features)
        
        # Train model 
        model = lgb.train(params, train_set, valid_sets= [train_set,val_set], verbose_eval=50, fobj=custom_asymmetric_train)
        
        # Make validation set predictions
        val_preds = model.predict(val_X.loc[val_X.index.isin(idx),features])
        df_val_pred.loc[df_val_pred.index.isin(idx), "F%s"%(i+1)] = val_preds
        
        # Make predictions on test data and save into submission file
        submission.loc[submission.index.isin(idx),"F%s"%(i+1)] = model.predict(test_X.loc[test_X.index.isin(idx),features])

submission.to_csv(submission_name, index=False)
==================================================
Fold1
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.98368	valid_1's rmse: 2.03996
Early stopping, best iteration is:
[27]	training's rmse: 2.08218	valid_1's rmse: 2.02511
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.22312	valid_1's rmse: 1.39048
[100]	training's rmse: 1.15617	valid_1's rmse: 1.39744
Early stopping, best iteration is:
[52]	training's rmse: 1.2192	valid_1's rmse: 1.38964
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.22243	valid_1's rmse: 2.42598
[100]	training's rmse: 2.0771	valid_1's rmse: 2.41697
Early stopping, best iteration is:
[80]	training's rmse: 2.12489	valid_1's rmse: 2.4094
==================================================
Fold2
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.90443	valid_1's rmse: 2.01478
[100]	training's rmse: 1.80182	valid_1's rmse: 2.02722
Early stopping, best iteration is:
[62]	training's rmse: 1.87692	valid_1's rmse: 1.98363
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.20968	valid_1's rmse: 1.47589
[100]	training's rmse: 1.13617	valid_1's rmse: 1.45181
[150]	training's rmse: 1.08513	valid_1's rmse: 1.46266
Early stopping, best iteration is:
[101]	training's rmse: 1.13509	valid_1's rmse: 1.45086
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.17061	valid_1's rmse: 2.21149
Early stopping, best iteration is:
[46]	training's rmse: 2.18838	valid_1's rmse: 2.2077
==================================================
Fold3
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.92991	valid_1's rmse: 2.15281
Early stopping, best iteration is:
[33]	training's rmse: 1.99216	valid_1's rmse: 2.13129
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.18816	valid_1's rmse: 1.4134
[100]	training's rmse: 1.12732	valid_1's rmse: 1.41504
Early stopping, best iteration is:
[79]	training's rmse: 1.14949	valid_1's rmse: 1.3986
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.23428	valid_1's rmse: 2.32322
Early stopping, best iteration is:
[24]	training's rmse: 2.45077	valid_1's rmse: 2.28347
==================================================
Fold4
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.94071	valid_1's rmse: 2.30784
Early stopping, best iteration is:
[21]	training's rmse: 2.10633	valid_1's rmse: 2.21975
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.23878	valid_1's rmse: 1.53018
Early stopping, best iteration is:
[21]	training's rmse: 1.34462	valid_1's rmse: 1.5211
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.40883	valid_1's rmse: 2.43757
Early stopping, best iteration is:
[25]	training's rmse: 2.63874	valid_1's rmse: 2.41129
==================================================
Fold5
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.99928	valid_1's rmse: 2.54608
[100]	training's rmse: 1.88971	valid_1's rmse: 2.5805
Early stopping, best iteration is:
[58]	training's rmse: 1.9756	valid_1's rmse: 2.54204
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.26393	valid_1's rmse: 1.65844
Early stopping, best iteration is:
[32]	training's rmse: 1.31388	valid_1's rmse: 1.65244
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.39586	valid_1's rmse: 2.77717
Early stopping, best iteration is:
[45]	training's rmse: 2.42326	valid_1's rmse: 2.77194
==================================================
Fold6
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.2976	valid_1's rmse: 2.73089
Early stopping, best iteration is:
[46]	training's rmse: 2.31378	valid_1's rmse: 2.72048
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.49167	valid_1's rmse: 1.76899
Early stopping, best iteration is:
[37]	training's rmse: 1.53257	valid_1's rmse: 1.74993
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.77003	valid_1's rmse: 3.53105
Early stopping, best iteration is:
[31]	training's rmse: 2.92729	valid_1's rmse: 3.41573
==================================================
Fold7
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.45218	valid_1's rmse: 3.07434
[100]	training's rmse: 2.28086	valid_1's rmse: 3.09041
Early stopping, best iteration is:
[51]	training's rmse: 2.44595	valid_1's rmse: 3.06826
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.53217	valid_1's rmse: 1.7559
[100]	training's rmse: 1.43616	valid_1's rmse: 1.77069
Early stopping, best iteration is:
[75]	training's rmse: 1.47399	valid_1's rmse: 1.74786
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.83485	valid_1's rmse: 3.23384
[100]	training's rmse: 2.62827	valid_1's rmse: 3.28374
Early stopping, best iteration is:
[53]	training's rmse: 2.81564	valid_1's rmse: 3.23075
==================================================
Fold8
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.12318	valid_1's rmse: 2.74593
Early stopping, best iteration is:
[43]	training's rmse: 2.15151	valid_1's rmse: 2.74005
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.32624	valid_1's rmse: 1.80242
[100]	training's rmse: 1.24394	valid_1's rmse: 1.78931
[150]	training's rmse: 1.19371	valid_1's rmse: 1.79672
[200]	training's rmse: 1.15569	valid_1's rmse: 1.78935
[250]	training's rmse: 1.12313	valid_1's rmse: 1.77873
Early stopping, best iteration is:
[241]	training's rmse: 1.12847	valid_1's rmse: 1.77818
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.4031	valid_1's rmse: 2.66758
[100]	training's rmse: 2.24762	valid_1's rmse: 2.65095
Early stopping, best iteration is:
[65]	training's rmse: 2.34901	valid_1's rmse: 2.64111
==================================================
Fold9
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.00312	valid_1's rmse: 2.90628
[100]	training's rmse: 1.88741	valid_1's rmse: 2.91023
Early stopping, best iteration is:
[52]	training's rmse: 1.99609	valid_1's rmse: 2.90311
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.30322	valid_1's rmse: 1.69059
Early stopping, best iteration is:
[38]	training's rmse: 1.33652	valid_1's rmse: 1.67851
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.27465	valid_1's rmse: 3.11948
[100]	training's rmse: 2.11477	valid_1's rmse: 3.11475
Early stopping, best iteration is:
[75]	training's rmse: 2.18105	valid_1's rmse: 3.08221
==================================================
Fold10
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.0144	valid_1's rmse: 2.48373
[100]	training's rmse: 1.89467	valid_1's rmse: 2.52664
Early stopping, best iteration is:
[52]	training's rmse: 2.00812	valid_1's rmse: 2.47373
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.25143	valid_1's rmse: 1.98217
Early stopping, best iteration is:
[39]	training's rmse: 1.27678	valid_1's rmse: 1.97518
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.32292	valid_1's rmse: 2.94014
Early stopping, best iteration is:
[43]	training's rmse: 2.35201	valid_1's rmse: 2.92917
==================================================
Fold11
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.01619	valid_1's rmse: 2.68927
Early stopping, best iteration is:
[27]	training's rmse: 2.1272	valid_1's rmse: 2.65414
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.30433	valid_1's rmse: 2.20437
Early stopping, best iteration is:
[38]	training's rmse: 1.33683	valid_1's rmse: 2.19665
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.47304	valid_1's rmse: 2.92541
[100]	training's rmse: 2.27502	valid_1's rmse: 2.90876
[150]	training's rmse: 2.15335	valid_1's rmse: 2.9217
Early stopping, best iteration is:
[102]	training's rmse: 2.26897	valid_1's rmse: 2.90824
==================================================
Fold12
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.09307	valid_1's rmse: 2.80293
[100]	training's rmse: 1.96031	valid_1's rmse: 2.78816
[150]	training's rmse: 1.87903	valid_1's rmse: 2.77714
Early stopping, best iteration is:
[140]	training's rmse: 1.8951	valid_1's rmse: 2.76648
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.33556	valid_1's rmse: 2.20324
[100]	training's rmse: 1.25295	valid_1's rmse: 2.19244
[150]	training's rmse: 1.20025	valid_1's rmse: 2.19289
Early stopping, best iteration is:
[112]	training's rmse: 1.23794	valid_1's rmse: 2.18785
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.48866	valid_1's rmse: 3.83201
Early stopping, best iteration is:
[37]	training's rmse: 2.56546	valid_1's rmse: 3.79878
==================================================
Fold13
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.40123	valid_1's rmse: 3.07915
Early stopping, best iteration is:
[44]	training's rmse: 2.42847	valid_1's rmse: 3.04885
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.56169	valid_1's rmse: 2.32998
Early stopping, best iteration is:
[48]	training's rmse: 1.56968	valid_1's rmse: 2.32544
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.89674	valid_1's rmse: 3.4933
[100]	training's rmse: 2.67423	valid_1's rmse: 3.45366
[150]	training's rmse: 2.53754	valid_1's rmse: 3.46179
Early stopping, best iteration is:
[109]	training's rmse: 2.64655	valid_1's rmse: 3.44884
==================================================
Fold14
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.53322	valid_1's rmse: 2.86665
Early stopping, best iteration is:
[23]	training's rmse: 2.7369	valid_1's rmse: 2.83096
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.59422	valid_1's rmse: 2.45232
Early stopping, best iteration is:
[24]	training's rmse: 1.7362	valid_1's rmse: 2.4415
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.92108	valid_1's rmse: 3.17195
Early stopping, best iteration is:
[44]	training's rmse: 2.95701	valid_1's rmse: 3.15624
==================================================
Fold15
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.18037	valid_1's rmse: 2.87716
[100]	training's rmse: 2.02842	valid_1's rmse: 2.90554
Early stopping, best iteration is:
[51]	training's rmse: 2.17564	valid_1's rmse: 2.8663
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.39133	valid_1's rmse: 2.58833
[100]	training's rmse: 1.30333	valid_1's rmse: 2.58925
Early stopping, best iteration is:
[62]	training's rmse: 1.36522	valid_1's rmse: 2.58265
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.45382	valid_1's rmse: 3.12258
[100]	training's rmse: 2.30192	valid_1's rmse: 3.10233
Early stopping, best iteration is:
[68]	training's rmse: 2.39259	valid_1's rmse: 3.08434
==================================================
Fold16
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.07277	valid_1's rmse: 2.19371
Early stopping, best iteration is:
[44]	training's rmse: 2.09446	valid_1's rmse: 2.19202
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.33388	valid_1's rmse: 2.1891
Early stopping, best iteration is:
[49]	training's rmse: 1.33631	valid_1's rmse: 2.18909
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.35858	valid_1's rmse: 2.77349
[100]	training's rmse: 2.205	valid_1's rmse: 2.80319
Early stopping, best iteration is:
[55]	training's rmse: 2.33935	valid_1's rmse: 2.76327
==================================================
Fold17
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.05577	valid_1's rmse: 2.49512
[100]	training's rmse: 1.93504	valid_1's rmse: 2.48089
[150]	training's rmse: 1.85329	valid_1's rmse: 2.47977
Early stopping, best iteration is:
[114]	training's rmse: 1.9122	valid_1's rmse: 2.4721
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.30802	valid_1's rmse: 2.38507
[100]	training's rmse: 1.22912	valid_1's rmse: 2.3639
Early stopping, best iteration is:
[90]	training's rmse: 1.24093	valid_1's rmse: 2.35516
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.40555	valid_1's rmse: 3.20788
[100]	training's rmse: 2.2291	valid_1's rmse: 3.19593
[150]	training's rmse: 2.11801	valid_1's rmse: 3.17956
Early stopping, best iteration is:
[140]	training's rmse: 2.13594	valid_1's rmse: 3.15901
==================================================
Fold18
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.06847	valid_1's rmse: 2.47858
[100]	training's rmse: 1.94172	valid_1's rmse: 2.37994
[150]	training's rmse: 1.86232	valid_1's rmse: 2.42637
Early stopping, best iteration is:
[106]	training's rmse: 1.93381	valid_1's rmse: 2.37591
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.37038	valid_1's rmse: 2.33613
[100]	training's rmse: 1.2783	valid_1's rmse: 2.29612
[150]	training's rmse: 1.22472	valid_1's rmse: 2.30838
Early stopping, best iteration is:
[102]	training's rmse: 1.27561	valid_1's rmse: 2.29579
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.53012	valid_1's rmse: 3.0506
[100]	training's rmse: 2.32285	valid_1's rmse: 3.02838
[150]	training's rmse: 2.20084	valid_1's rmse: 3.0189
[200]	training's rmse: 2.10092	valid_1's rmse: 3.00749
[250]	training's rmse: 2.0284	valid_1's rmse: 3.00267
[300]	training's rmse: 1.9643	valid_1's rmse: 2.99297
[350]	training's rmse: 1.90852	valid_1's rmse: 2.99574
Early stopping, best iteration is:
[321]	training's rmse: 1.9391	valid_1's rmse: 2.98955
==================================================
Fold19
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.13439	valid_1's rmse: 2.48106
[100]	training's rmse: 2.00586	valid_1's rmse: 2.45791
[150]	training's rmse: 1.92423	valid_1's rmse: 2.44762
Early stopping, best iteration is:
[134]	training's rmse: 1.94542	valid_1's rmse: 2.44538
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.396	valid_1's rmse: 1.99721
Early stopping, best iteration is:
[23]	training's rmse: 1.50677	valid_1's rmse: 1.97257
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.56177	valid_1's rmse: 2.93108
[100]	training's rmse: 2.37016	valid_1's rmse: 2.97682
Early stopping, best iteration is:
[57]	training's rmse: 2.52791	valid_1's rmse: 2.92335
==================================================
Fold20
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.43762	valid_1's rmse: 2.9262
[100]	training's rmse: 2.28746	valid_1's rmse: 2.88299
[150]	training's rmse: 2.19483	valid_1's rmse: 2.88271
Early stopping, best iteration is:
[114]	training's rmse: 2.25727	valid_1's rmse: 2.87027
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.62416	valid_1's rmse: 2.76442
[100]	training's rmse: 1.51962	valid_1's rmse: 2.73572
[150]	training's rmse: 1.44932	valid_1's rmse: 2.72031
[200]	training's rmse: 1.40031	valid_1's rmse: 2.71311
[250]	training's rmse: 1.356	valid_1's rmse: 2.70591
[300]	training's rmse: 1.31488	valid_1's rmse: 2.69449
[350]	training's rmse: 1.28782	valid_1's rmse: 2.684
[400]	training's rmse: 1.25921	valid_1's rmse: 2.66609
[450]	training's rmse: 1.23733	valid_1's rmse: 2.66126
[500]	training's rmse: 1.21236	valid_1's rmse: 2.65779
[550]	training's rmse: 1.19041	valid_1's rmse: 2.65385
[600]	training's rmse: 1.17125	valid_1's rmse: 2.65401
[650]	training's rmse: 1.15384	valid_1's rmse: 2.65121
[700]	training's rmse: 1.13564	valid_1's rmse: 2.65167
Early stopping, best iteration is:
[678]	training's rmse: 1.14299	valid_1's rmse: 2.6431
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.97236	valid_1's rmse: 3.51138
[100]	training's rmse: 2.74117	valid_1's rmse: 3.505
[150]	training's rmse: 2.59998	valid_1's rmse: 3.55426
Early stopping, best iteration is:
[101]	training's rmse: 2.73817	valid_1's rmse: 3.50243
==================================================
Fold21
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.56888	valid_1's rmse: 3.51335
[100]	training's rmse: 2.39343	valid_1's rmse: 3.45126
Early stopping, best iteration is:
[81]	training's rmse: 2.44649	valid_1's rmse: 3.41334
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.6538	valid_1's rmse: 3.75042
[100]	training's rmse: 1.54585	valid_1's rmse: 3.70768
[150]	training's rmse: 1.47627	valid_1's rmse: 3.69304
[200]	training's rmse: 1.41848	valid_1's rmse: 3.68425
Early stopping, best iteration is:
[174]	training's rmse: 1.44425	valid_1's rmse: 3.67747
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.97894	valid_1's rmse: 3.90359
[100]	training's rmse: 2.76569	valid_1's rmse: 3.90122
Early stopping, best iteration is:
[77]	training's rmse: 2.8471	valid_1's rmse: 3.85982
==================================================
Fold22
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.19984	valid_1's rmse: 2.66975
Early stopping, best iteration is:
[34]	training's rmse: 2.27326	valid_1's rmse: 2.65704
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.45206	valid_1's rmse: 2.15255
[100]	training's rmse: 1.36275	valid_1's rmse: 2.09977
[150]	training's rmse: 1.30477	valid_1's rmse: 2.10324
Early stopping, best iteration is:
[102]	training's rmse: 1.36032	valid_1's rmse: 2.0981
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.5068	valid_1's rmse: 2.61681
Early stopping, best iteration is:
[49]	training's rmse: 2.51225	valid_1's rmse: 2.61511
==================================================
Fold23
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.09157	valid_1's rmse: 2.12855
Early stopping, best iteration is:
[28]	training's rmse: 2.19169	valid_1's rmse: 2.11306
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.38129	valid_1's rmse: 1.89752
[100]	training's rmse: 1.29344	valid_1's rmse: 1.86605
Early stopping, best iteration is:
[87]	training's rmse: 1.31452	valid_1's rmse: 1.86264
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.39086	valid_1's rmse: 2.72323
[100]	training's rmse: 2.23371	valid_1's rmse: 2.70187
[150]	training's rmse: 2.13659	valid_1's rmse: 2.68091
Early stopping, best iteration is:
[135]	training's rmse: 2.15985	valid_1's rmse: 2.67747
==================================================
Fold24
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.08416	valid_1's rmse: 2.1393
Early stopping, best iteration is:
[21]	training's rmse: 2.24408	valid_1's rmse: 2.06739
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.34486	valid_1's rmse: 1.54412
[100]	training's rmse: 1.26982	valid_1's rmse: 1.53083
[150]	training's rmse: 1.21533	valid_1's rmse: 1.52424
[200]	training's rmse: 1.1765	valid_1's rmse: 1.52306
Early stopping, best iteration is:
[153]	training's rmse: 1.21246	valid_1's rmse: 1.51909
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.42449	valid_1's rmse: 3.29609
[100]	training's rmse: 2.25784	valid_1's rmse: 3.34893
Early stopping, best iteration is:
[56]	training's rmse: 2.39872	valid_1's rmse: 3.29325
==================================================
Fold25
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.09116	valid_1's rmse: 2.27681
Early stopping, best iteration is:
[34]	training's rmse: 2.15077	valid_1's rmse: 2.26744
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.39597	valid_1's rmse: 1.58929
[100]	training's rmse: 1.30273	valid_1's rmse: 1.56382
[150]	training's rmse: 1.24825	valid_1's rmse: 1.56798
[200]	training's rmse: 1.20809	valid_1's rmse: 1.56197
Early stopping, best iteration is:
[167]	training's rmse: 1.23042	valid_1's rmse: 1.55858
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.55292	valid_1's rmse: 3.73835
Early stopping, best iteration is:
[21]	training's rmse: 2.91123	valid_1's rmse: 3.59039
==================================================
Fold26
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.16193	valid_1's rmse: 2.42305
Early stopping, best iteration is:
[38]	training's rmse: 2.21473	valid_1's rmse: 2.41895
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.41655	valid_1's rmse: 1.63681
[100]	training's rmse: 1.33415	valid_1's rmse: 1.64987
Early stopping, best iteration is:
[65]	training's rmse: 1.38551	valid_1's rmse: 1.63478
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.61596	valid_1's rmse: 2.83765
Early stopping, best iteration is:
[46]	training's rmse: 2.63443	valid_1's rmse: 2.81538
==================================================
Fold27
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.47983	valid_1's rmse: 2.68442
[100]	training's rmse: 2.32641	valid_1's rmse: 2.72566
Early stopping, best iteration is:
[57]	training's rmse: 2.4459	valid_1's rmse: 2.67965
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.66443	valid_1's rmse: 2.0189
Early stopping, best iteration is:
[27]	training's rmse: 1.77227	valid_1's rmse: 1.99616
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 3.04139	valid_1's rmse: 3.45164
[100]	training's rmse: 2.80087	valid_1's rmse: 3.50763
Early stopping, best iteration is:
[50]	training's rmse: 3.04139	valid_1's rmse: 3.45164
==================================================
Fold28
==================================================
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 2.61726	valid_1's rmse: 3.47666
[100]	training's rmse: 2.4388	valid_1's rmse: 3.34756
[150]	training's rmse: 2.32997	valid_1's rmse: 3.27322
[200]	training's rmse: 2.25262	valid_1's rmse: 3.24974
[250]	training's rmse: 2.18053	valid_1's rmse: 3.22894
Early stopping, best iteration is:
[248]	training's rmse: 2.18169	valid_1's rmse: 3.22727
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 1.75227	valid_1's rmse: 2.34698
[100]	training's rmse: 1.61494	valid_1's rmse: 2.31281
Early stopping, best iteration is:
[79]	training's rmse: 1.65982	valid_1's rmse: 2.30193
Training until validation scores don't improve for 50 rounds
[50]	training's rmse: 3.0598	valid_1's rmse: 3.22414
Early stopping, best iteration is:
[44]	training's rmse: 3.09419	valid_1's rmse: 3.22097
In [7]:
# Diagnose the model by investigating some of the predictions.
sales_clean = pd.read_csv("../input/m5-forecasting-accuracy/" + 'sales_train_evaluation.csv')
cal = pd.read_csv("../input/m5-forecasting-accuracy/calendar.csv")
prices = pd.read_csv("../input/m5-forecasting-accuracy/sell_prices.csv")
train_df = sales_clean.iloc[:, :-28] 

valid_df = sales_clean.iloc[:, -28:] # ground truth for the last 28 days (the validation period)
evaluator = w.WRMSSEEvaluator(train_df, valid_df, cal, prices)

df_val_pred.columns = ["d_"+ str(i) for i in range(1914,1942)]
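# F1..F28 renamed to the day labels d_1914..d_1941 of the 28-day validation period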

# Calculate validation scores
evaluator.score(df_val_pred)
evaluator.print_more_scores()

# Thanks to the Kaggle community for providing the visualisation code
w.create_dashboard(evaluator, groups='all', cal=cal)
Overall:   0.6314   
Level 1:   0.4337  all_id 
Level 2:   0.4889  state_id 
Level 3:   0.5738  store_id 
Level 4:   0.5005  cat_id 
Level 5:   0.5755  dept_id 
Level 6:   0.5597  ['state_id', 'cat_id'] 
Level 7:   0.6273  ['state_id', 'dept_id'] 
Level 8:   0.6388  ['store_id', 'cat_id'] 
Level 9:   0.7012  ['store_id', 'dept_id'] 
Level 10:  0.8206  item_id 
Level 11:  0.8282  ['item_id', 'state_id'] 
Level 12:  0.8281  ['item_id', 'store_id']